Effective Data Science Infrastructure by Ville Tuulos

Author: Ville Tuulos
Language: eng
Format: epub, mobi, pdf
Publisher: Manning Publications Co.
Published: 2022-07-08T22:00:00+00:00


Figure 5.9 Execution time vs. the number of CPU cores in the multithreaded case

Figure 5.9 shows that running the algorithm with num_cpu=1 takes about 100 seconds for a version of the full matrix. For this dataset, the sweet spot seems to be at num_cpu=4, which improves performance by about 40%. Beyond this point, the overhead of creating and aggregating per-thread output matrices outweighs the benefit of handling increasingly small input shards in each thread.
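The trade-off described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's implementation: the function names (threaded_product, shard_product, time_variants) are hypothetical, and the per-shard computation (the shard's transpose multiplied by the shard) is only a stand-in for the numerically intensive algorithm. It assumes NumPy is installed and that its matrix multiplication runs in compiled code, so threads can make progress in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
import time

import numpy as np


def shard_product(shard):
    # Stand-in workload (assumption): each thread builds its own
    # output matrix from the rows of its input shard only.
    return shard.T @ shard


def threaded_product(matrix, num_cpu):
    # Split the rows into num_cpu roughly equal input shards.
    shards = np.array_split(matrix, num_cpu, axis=0)
    with ThreadPoolExecutor(max_workers=num_cpu) as pool:
        partials = list(pool.map(shard_product, shards))
    # Aggregating the per-thread output matrices is extra work
    # that grows with the number of threads.
    return np.sum(partials, axis=0)


def time_variants(matrix, cpu_counts=(1, 2, 4, 8)):
    # Measure wall-clock time for each thread count.
    timings = {}
    for num_cpu in cpu_counts:
        start = time.perf_counter()
        threaded_product(matrix, num_cpu)
        timings[num_cpu] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic input (assumption): the size is arbitrary.
    matrix = rng.random((20000, 200))
    for num_cpu, seconds in time_variants(matrix).items():
        print(f"num_cpu={num_cpu}: {seconds:.3f}s")
```

The timings depend on the machine, the matrix size, and how NumPy was built, so the sweet spot may differ from the one in figure 5.9. The structure is what matters: more threads mean smaller shards per thread, but also more partial matrices to create and sum.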

Summarizing the variants

This section illustrated a realistic journey of optimizing the performance of a numerically intensive algorithm, as follows:

First, we started with a simple version of the algorithm.


